OCR exploration server

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Natural language morphology integration in off-line Arabic optical text recognition.

Internal identifier: 000365 (Main/Exploration); previous: 000364; next: 000366

Authors: Slim Kanoun [Tunisia]; Adel M. Alimi; Yves Lecourtier

Source: IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics (2011)

RBID : pubmed:20889434

Abstract

In this paper, we propose a new linguistic-based approach called the affixal approach for Arabic word and text image recognition. Most of the existing works in the field integrate the knowledge of the Arabic language in the recognition process in two ways: either in post-recognition using the language of dictionary (dictionary of words) to validate the word hypotheses suggested by the OCR or in the course of the recognition process (recognition directed by a lexicon) using a statistical model of the language (Hidden Markov Model or N-gram). The proposed approach uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. The principal contribution of the proposed approach is to be able to categorize the word hypotheses in words that are either derived or not derived from roots and to characterize morphologically each word hypothesis in order to prepare the text hypotheses for later analyses (for example, syntactic analysis; to filter the sentence hypotheses).
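
As a rough illustration of the idea described in the abstract (sorting word hypotheses into words derived from a root and words that are not, while recording a morphological analysis for each), here is a toy sketch in Python. It is not the authors' algorithm: the affix lists, patterns, roots, and the function names match_pattern and analyze are all hypothetical, and the words are Latin transliterations chosen only for readability.

```python
# Toy, transliterated data: every entry below is a hypothetical stand-in,
# not the lexicon or affix inventory used in the paper.
PREFIXES = ["", "al", "wa", "waal"]
SUFFIXES = ["", "at", "un"]
ROOTS = {"ktb", "drs"}
# Patterns: digits 1-3 mark root-consonant slots, other characters are literal.
PATTERNS = ["1i2a3", "ma12a3", "1a2a3a"]
NON_DERIVED = {"min", "fi", "tilifun"}


def match_pattern(stem, pattern):
    """Return the consonants filling the digit slots if stem fits pattern, else None."""
    if len(stem) != len(pattern):
        return None
    slots = {}
    for ch, p in zip(stem, pattern):
        if p.isdigit():
            slots[p] = ch
        elif p != ch:
            return None
    return "".join(slots[k] for k in sorted(slots))


def analyze(word):
    """List every (category, prefix, stem, suffix, root) reading of word."""
    readings = []
    for pre in PREFIXES:
        for suf in SUFFIXES:
            if len(pre) + len(suf) > len(word):
                continue
            if not word.startswith(pre) or not word.endswith(suf):
                continue
            stem = word[len(pre): len(word) - len(suf)]
            if stem in NON_DERIVED:
                readings.append(("non-derived", pre, stem, suf, None))
            for pat in PATTERNS:
                root = match_pattern(stem, pat)
                if root in ROOTS:
                    readings.append(("derived", pre, stem, suf, root))
    return readings


# Hypothetical OCR word hypotheses; the second one has a misread letter.
for hypothesis in ["alkitab", "alkifab", "min"]:
    print(hypothesis, analyze(hypothesis))
```

Under these assumptions, a hypothesis with no reading (here alkifab) could be discarded, while the surviving hypotheses carry a prefix, stem, suffix, and root that a later syntactic step could use.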

DOI: 10.1109/TSMCB.2010.2072990
PubMed: 20889434


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Natural language morphology integration in off-line Arabic optical text recognition.</title>
<author>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
<affiliation wicri:level="1">
<nlm:affiliation>REsearch Group on Intelligent Machines (REGIM), National School of Engineers, University of Sfax, 3038 Sfax, Tunisia. slim.kanoun@yahoo.fr</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>REsearch Group on Intelligent Machines (REGIM), National School of Engineers, University of Sfax, 3038 Sfax</wicri:regionArea>
<wicri:noRegion>3038 Sfax</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Alimi, Adel M" sort="Alimi, Adel M" uniqKey="Alimi A" first="Adel M" last="Alimi">Adel M. Alimi</name>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2011">2011</date>
<idno type="doi">10.1109/TSMCB.2010.2072990</idno>
<idno type="RBID">pubmed:20889434</idno>
<idno type="pmid">20889434</idno>
<idno type="wicri:Area/PubMed/Corpus">000038</idno>
<idno type="wicri:Area/PubMed/Curation">000038</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000038</idno>
<idno type="wicri:Area/Ncbi/Merge">000088</idno>
<idno type="wicri:Area/Ncbi/Curation">000088</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000088</idno>
<idno type="wicri:Area/Main/Merge">000370</idno>
<idno type="wicri:Area/Main/Curation">000365</idno>
<idno type="wicri:Area/Main/Exploration">000365</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Natural language morphology integration in off-line Arabic optical text recognition.</title>
<author>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
<affiliation wicri:level="1">
<nlm:affiliation>REsearch Group on Intelligent Machines (REGIM), National School of Engineers, University of Sfax, 3038 Sfax, Tunisia. slim.kanoun@yahoo.fr</nlm:affiliation>
<country xml:lang="fr">Tunisie</country>
<wicri:regionArea>REsearch Group on Intelligent Machines (REGIM), National School of Engineers, University of Sfax, 3038 Sfax</wicri:regionArea>
<wicri:noRegion>3038 Sfax</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Alimi, Adel M" sort="Alimi, Adel M" uniqKey="Alimi A" first="Adel M" last="Alimi">Adel M. Alimi</name>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
</author>
</analytic>
<series>
<title level="j">IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics : a publication of the IEEE Systems, Man, and Cybernetics Society</title>
<idno type="eISSN">1941-0492</idno>
<imprint>
<date when="2011" type="published">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Automatic Data Processing (methods)</term>
<term>Image Enhancement (methods)</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Natural Language Processing</term>
<term>Pattern Recognition, Automated (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Automatic Data Processing</term>
<term>Image Enhancement</term>
<term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Artificial Intelligence</term>
<term>Natural Language Processing</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we propose a new linguistic-based approach called the affixal approach for Arabic word and text image recognition. Most of the existing works in the field integrate the knowledge of the Arabic language in the recognition process in two ways: either in post-recognition using the language of dictionary (dictionary of words) to validate the word hypotheses suggested by the OCR or in the course of the recognition process (recognition directed by a lexicon) using a statistical model of the language (Hidden Markov Model or N-gram). The proposed approach uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. The principal contribution of the proposed approach is to be able to categorize the word hypotheses in words that are either derived or not derived from roots and to characterize morphologically each word hypothesis in order to prepare the text hypotheses for later analyses (for example, syntactic analysis; to filter the sentence hypotheses).</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Tunisie</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Alimi, Adel M" sort="Alimi, Adel M" uniqKey="Alimi A" first="Adel M" last="Alimi">Adel M. Alimi</name>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
</noCountry>
<country name="Tunisie">
<noRegion>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
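
The record above can be read with a standard XML parser once its prefixes are bound. The sketch below is an illustration in Python, not part of the Dilib toolchain: it parses an abridged copy of the record with xml.etree.ElementTree and extracts the title, authors, and identifiers. The helper names parse_record and summarize and the namespace URIs are assumptions; the record does not declare the wicri: and nlm: prefixes, so placeholder URIs are injected only to satisfy the parser.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record above (one author, a few identifiers).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Natural language morphology integration in off-line Arabic optical text recognition.</title>
<author>
<name sortKey="Kanoun, Slim" first="Slim" last="Kanoun">Slim Kanoun</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Tunisie</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1109/TSMCB.2010.2072990</idno>
<idno type="pmid">20889434</idno>
<idno type="wicri:Area/Main/Exploration">000365</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs: assumptions made only so the parser accepts
# the "wicri:" and "nlm:" prefixes, which the record leaves undeclared.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "nlm": "urn:example:nlm",
}


def parse_record(text):
    """Parse a record after binding its undeclared prefixes on the root."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names, and typed identifiers of a record."""
    stmt = root.find(".//titleStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [n.text for n in stmt.iterfind("author/name")],
        "ids": {i.get("type"): i.text
                for i in root.iterfind(".//publicationStmt/idno")},
    }


info = summarize(parse_record(RECORD))
print(info["title"])
print(info["authors"])
print(info["ids"]["doi"])
```

Without the injected declarations, ElementTree rejects the record because of the unbound prefixes, which is why the root tag is patched before parsing.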

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000365 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000365 | SxmlIndent | more

To link to this page from the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     pubmed:20889434
   |texte=   Natural language morphology integration in off-line Arabic optical text recognition.
}}
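
When many records need such a link, the template call can be produced programmatically. The following Python sketch is only an illustration: the function name explor_link and its argument names are assumptions, while the template name and its parameter names (wiki, area, flux, étape, type, clé, texte) are copied from the example above.

```python
def explor_link(wiki, area, flux, step, id_type, key, text):
    """Format an {{Explor lien}} call laid out like the example above."""
    params = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", id_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in params:
        # Pad "name=" to nine characters so the values line up in a column.
        lines.append(f"   |{(name + '=').ljust(9)}{value}")
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    step="Exploration",
    id_type="RBID",
    key="pubmed:20889434",
    text="Natural language morphology integration in off-line Arabic optical text recognition.",
)
print(link)
```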

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:20889434" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024